
Introduce differentiation interface #2137

Merged · 37 commits · Mar 25, 2025

Conversation

SouthEndMusic (Collaborator)

Fixes #2134

@SouthEndMusic (Collaborator, Author)

@gdalle it would be nice if you could have a look at 64091bc. This already works very nicely! We might be able to get rid of our direct SparseConnectivityTracer dependency, were it not that we (I) did some naughty overloads using GradientTracer. What would the next step towards Enzyme support look like?

@SouthEndMusic (Collaborator, Author)

When testing dense Jacobian + ForwardDiff I get an error like this:

ERROR: LoadError: Invalid Tag object: ...
Observed ForwardDiff.Tag{...}

I wonder whether that has something to do with passing the AD backend to both DifferentiationInterface.jacobian and the algorithm constructor. I also wonder how this works for algorithms that require derivatives w.r.t. time.

@gdalle

gdalle commented Mar 10, 2025

Taking a look now

@@ -28,6 +28,33 @@ struct Model{T}
end
end

function get_jac_eval(du::Vector, u::Vector, p::Parameters, solver::Solver)
backend = if solver.autodiff
AutoForwardDiff()

Maybe you want to configure the tag here if it is available from somewhere else (perhaps the solver?)

Collaborator Author

solver is our own solver config object, but it works if I consistently specify the same tag everywhere 👍
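Consistently specifying the tag might be sketched as follows (a sketch, not the PR's actual code: `MyJacTag` is a hypothetical tag type, and ADTypes' `AutoForwardDiff` accepts a `tag` keyword):

```julia
import ForwardDiff
using DifferentiationInterface  # re-exports AutoForwardDiff from ADTypes

struct MyJacTag end  # hypothetical tag type, shared by every call site

# Build one backend with an explicit tag and pass this same object both to
# DI's prepare_jacobian/jacobian! and to the solver configuration, so the
# Dual tags always match and no "Invalid Tag object" error occurs.
backend = AutoForwardDiff(; tag = ForwardDiff.Tag(MyJacTag(), Float64))
```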

core/src/util.jl Outdated
p.all_nodes_active = false
jac_prototype
end

# Custom overloads

If you want to get rid of the SCT dependency, you may want to

  • put those overloads in an SCT package extension
  • ask the user to provide an AD backend, and if it is an AutoSparse, retrieve its sparsity_detector instead of providing your own

Collaborator Author

Hmm, this probably doesn't fit our application as described in #2137 (comment). Keeping the SCT dependency is fine.

# Custom overloads
reduction_factor(x::GradientTracer, threshold::Real) = x
reduction_factor(x::GradientTracer, ::Real) = x

Note that this is explicitly discouraged in the SCT docs: see the following page to add overloads properly

https://adrianhill.de/SparseConnectivityTracer.jl/stable/internals/adding_overloads/

Collaborator Author

I know I know 🙃

Collaborator Author

Some of these functions have more than two arguments, but for all of them we only care about the derivative with respect to one input. It's not clear to me whether or how that fits within the overload functionality.


It makes sense to use our mechanisms. They will generate a couple of superfluous methods, but I don't see the harm in that.


To be fair, this specific line of code looks harmless. But once you have more than a handful of overloads, you might want to create an SCT extension and follow our docs. The generated code will be more future proof and compatible with local and global Jacobian and Hessian tracers. Your current code only supports global Jacobians.


# Activate all nodes to catch all possible state dependencies
p.all_nodes_active = true
prep = prepare_jacobian((du, u) -> water_balance!(du, u, p, t), du, backend, u)

Suggested change
prep = prepare_jacobian((du, u) -> water_balance!(du, u, p, t), du, backend, u)
prep = prepare_jacobian(water_balance!, du, backend, u, Constant(p), Constant(t))

See https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/explanation/advanced/#Contexts

Collaborator Author

From the docs:

Another option would be creating a closure, but that is sometimes undesirable.

When is a closure undesirable?

gradient(f, backend, x, Constant(c))
gradient(f, backend, x, Cache(c))

In the first call, c is kept unchanged throughout the function evaluation. In the second call, c can be mutated with values computed during the function.

Our p contains caches for in-place computations in our RHS (hence the discussion on PreallocationTools etc. in the related issue). Does that mean that we should use Cache(p)?

Collaborator Author

Should SciMLStructures.jl come in here for more granular control?

@gdalle (Mar 10, 2025)

When is a closure undesirable?

With Enzyme in particular it can make things slower or even error. With other backends it doesn't make much of a difference, but explicitly laying out the Contexts also allows taking element types into account (e.g. for handling translation to Dual with Caches).

Our p contains caches for in place computations in our RHS (hence the discussion on PreallocationTools etc. in the related issue). does that mean that we should use Cache(p)?

Does p contain anything whose value you care about?
In general, you might want to split it between a Constant part and a Cache part.

Should SciMLStructures.jl come in here for more granular control?

DI has no specific support for SciMLStructures
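A minimal sketch of the Constant/Cache split with DI's in-place Jacobian API (the function and parameter names are illustrative stand-ins, not the PR's actual `water_balance!` or `Parameters`):

```julia
using DifferentiationInterface  # provides Constant, Cache, AutoForwardDiff (via ADTypes)
import ForwardDiff

# Toy in-place RHS: p_const holds values we care about, p_cache is scratch
# storage that DI is allowed to re-allocate with a Dual element type.
function f!(du, u, p_const, p_cache, t)
    p_cache .= p_const .* u   # intermediate result written into the cache
    du .= p_cache .+ t
    return nothing
end

u = [1.0, 2.0]
du = similar(u)
p_const = [3.0, 4.0]
p_cache = similar(u)
backend = AutoForwardDiff()

# The same contexts are passed at preparation and execution time.
prep = prepare_jacobian(f!, du, backend, u, Constant(p_const), Cache(p_cache), Constant(0.5))
J = zeros(2, 2)
jacobian!(f!, du, J, prep, backend, u, Constant(p_const), Cache(p_cache), Constant(0.5))
# Here du_i = p_const[i]*u[i] + t, so J is diagonal: [3.0 0.0; 0.0 4.0]
```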

@SouthEndMusic (Collaborator, Author, Mar 11, 2025)

@gdalle I'm working on a refactor where e.g. the prep definition now looks like this:

prep = prepare_jacobian(
        water_balance!,
        du,
        backend,
        u,
        Constant(p_non_diff),
        Cache(p_diff),
        Constant(p_mutable),
        Constant(t),
    )

This now fails in the sparsity detection because of an attempt to write GradientTracer values to a Vector{Float64} field of p_diff::ParametersDiff. I made ParametersDiff parametric so ParametersDiff{GradientTracer{...}} can exist, and I half expected this to be constructed internally. This probably worked before because of PreallocationTools.

Collaborator Author

Oh I just saw this warning in the docs:

Most backends require any Cache context to be an AbstractArray.

Let's see what I can do with that.

@SouthEndMusic (Collaborator, Author, Mar 11, 2025)

It doesn't look like that quickly solves the problem. I just naively subtyped ParametersDiff{T} <: AbstractVector{T}. Maybe I need to overload some methods?
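For reference, subtyping `AbstractVector` alone is not enough: Julia's array interface requires at least `size` and `getindex`/`setindex!`, and a `similar` method accepting a new element type is what lets a caller allocate a Dual- or Tracer-typed copy. A sketch, with a hypothetical `FlatParams` standing in for the PR's `ParametersDiff`:

```julia
# Hypothetical flat parameter wrapper implementing the array interface.
struct FlatParams{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(p::FlatParams) = size(p.data)
Base.getindex(p::FlatParams, i::Int) = p.data[i]
Base.setindex!(p::FlatParams, v, i::Int) = (p.data[i] = v)
# Allow allocation with a different element type S (e.g. a Dual or a tracer):
Base.similar(p::FlatParams, ::Type{S}) where {S} =
    FlatParams{S}(Vector{S}(undef, length(p.data)))
```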


Thanks, that's a tricky one and it's indeed on me. If you don't want to wait for a DI fix (ETA ~ days), a short-term solution would be to use PreallocationTools and a closure, even if it makes Enzyme angry.

Comment on lines 51 to 53
jac =
(J, u, p, t) ->
jacobian!((du, u) -> water_balance!(du, u, p, t), du, J, prep, backend, u)

Suggested change
jac =
(J, u, p, t) ->
jacobian!((du, u) -> water_balance!(du, u, p, t), du, J, prep, backend, u)
jac(J, u, p, t) = jacobian!(water_balance!, du, J, prep, backend, u, Constant(p), Constant(t))

@SouthEndMusic SouthEndMusic marked this pull request as draft March 10, 2025 15:58
@gdalle

gdalle commented Mar 13, 2025

Pass most tests

Love the confident commit naming. That's the spirit we wanna see

Comment on lines 50 to 51
diff_cache_SCT =
zeros(GradientTracer{IndexSetGradientPattern{Int64, BitSet}}, length(diff_cache))

With this PR you can use SCT.jacobian_buffer instead, and with the update to DI I'll make once that is merged, you probably won't need any tweak at all

Collaborator Author

Thanks!

@gdalle

gdalle commented Mar 13, 2025

@SouthEndMusic can you take the branch from JuliaDiff/DifferentiationInterface.jl#739 for a spin, see if it works?

Warning

I am aware that using DI.Cache still gives rise to allocations during each Jacobian computation. You're the first person actually using it, so I plan to fix it, but first I want to know if it runs

@SouthEndMusic (Collaborator, Author)

@gdalle I took the main branch, and it indeed works 👍

@gdalle

gdalle commented Mar 14, 2025

@SouthEndMusic with the branch from JuliaDiff/DifferentiationInterface.jl#741 the Caches should be allocation-free after preparation.

@gdalle

gdalle commented Mar 15, 2025

The changes have been released

@SouthEndMusic (Collaborator, Author)

Some runtimes of the hws_2024_7_0 model:

# FiniteDiff: 22.592886
# ForwardDiff: 20.455309
# ForwardDiff + type instability fix: 17.090883

@gdalle

gdalle commented Mar 18, 2025

If the Jacobian is sparse, can you try other orders inside the GreedyColoringAlgorithm? For instance, GreedyColoringAlgorithm(LargestFirst()) or GreedyColoringAlgorithm(RandomOrder(rng, seed))?

@SouthEndMusic (Collaborator, Author)

If the Jacobian is sparse, can you try other orders inside the GreedyColoringAlgorithm? For instance, GreedyColoringAlgorithm(LargestFirst()) or GreedyColoringAlgorithm(RandomOrder(rng, seed))?

What can be the effect of this? A different number of RHS calls required to compute the Jacobian?

@gdalle

gdalle commented Mar 18, 2025

Yes, it influences the number of different colors with which the columns of the Jacobian are colored, and one color equals one function call (not exactly true with ForwardDiff though). This could accelerate the FiniteDiff version significantly if the natural coloring was suboptimal.

@gdalle

gdalle commented Mar 18, 2025

You can call SparseMatrixColorings.ncolors(prep) on the Jacobian preparation result to see how many colors you have. For a dense Jacobian, this is equal to the total number of columns (input dimension). And the rule is that the cost of the sparse Jacobian scales with the number of colors (or the number of colors divided by 12 for ForwardDiff).
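A sketch of inspecting the coloring on a toy banded function (the function and sizes here are illustrative; `ncolors` on the preparation result is as described above):

```julia
using DifferentiationInterface  # re-exports AutoSparse from ADTypes
import ForwardDiff
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm, LargestFirst, ncolors

# Toy function: output i depends only on x[i] and its cyclic neighbor,
# so the Jacobian is sparse and its columns need very few colors.
f(x) = x .* circshift(x, 1)

backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(LargestFirst()),
)
x = rand(10)
prep = prepare_jacobian(f, backend, x)
ncolors(prep)  # far fewer colors than the 10 a dense Jacobian would need
```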

@SouthEndMusic (Collaborator, Author)

LargestFirst() doesn't seem to have a significant effect

@gdalle

gdalle commented Mar 18, 2025

Heads up, DI v0.6.46 supports nested tuples and named tuples of arrays as caches, at least for the most common backends

@SouthEndMusic SouthEndMusic marked this pull request as draft March 19, 2025 09:32
@SouthEndMusic (Collaborator, Author)

Heads up, DI v0.6.46 supports nested tuples and named tuples of arrays as caches, at least for the most common backends

This is really nice, quite a bit of complexity can be removed now 👍

@SouthEndMusic SouthEndMusic marked this pull request as ready for review March 19, 2025 10:56
@visr (Member) left a comment

Great stuff @SouthEndMusic. And thanks for all the help @gdalle.

I left some minor comments, but let's get this in soon, we can always refine later.

@assert flow_rate_update.name == :flow_rate
flow_rate_ = minimum(flow_rate_update.value.u)

if flow_rate_ < 0.0
errors = true
control_state = key[2]
@error "$id_controlled flow rates must be non-negative, found $flow_rate_ for control state '$control_state'."
@error "Negative flow rate(s) for $id_controlled, control state '$control_state' found."
Member

Suggested change
@error "Negative flow rate(s) for $id_controlled, control state '$control_state' found."
@error "Negative flow rate(s) found." node_id=id_controlled control_state

visr and others added 5 commits March 21, 2025 13:00
This adds a function `model.to_fews(region_dir)` that converts the
network and results to files that Delft-FEWS can directly handle. It is
marked as experimental for now.

@gijsber is working on a Delft-FEWS configuration that can be used to
visualize model results, to complement our existing tools. We'll likely
add this configuration to this monorepo since it is generic. #2159 also
pertains to this work.

What is especially nice is the spatio-temporal support of Delft-FEWS, so
we can make visualizations like this:


![image](https://github.com/user-attachments/assets/2e61bf82-0d7d-4558-a645-755d7e763b74)

In theory we can support similar functionality with QGIS, but looking at
the plots in #1369 this would
likely need work in QGIS itself. So this is really a quick win to be
able to inspect models better.

---------

Co-authored-by: Maarten Pronk <git@evetion.nl>
@SouthEndMusic SouthEndMusic merged commit a76ab8b into main Mar 25, 2025
19 of 20 checks passed
@SouthEndMusic SouthEndMusic deleted the introduce_differentiation_interface branch March 25, 2025 12:50
Successfully merging this pull request may close these issues.

Introduce DifferentiationInterface.jl for Jacobian computation
4 participants